Convolutional Neural Networks (CNN)
Table of Contents
Filter (or Kernel)
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as multiplication by a sparse (Toeplitz) matrix
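These hand-designed filters are easy to try directly. Below is a minimal NumPy sketch (the `conv2d_valid` helper is illustrative, not part of this notebook's TensorFlow code); it applies a box-blur kernel and a classic edge-detection kernel to a small step-edge image:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'VALID' 2D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Classic hand-designed kernels
smooth = np.ones((3, 3)) / 9.0              # box blur (smoothing)
edge   = np.array([[-1., -1., -1.],
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])        # edge detector

img = np.zeros((6, 6))
img[:, 3:] = 1.0                            # image with a vertical step edge

print(conv2d_valid(img, edge))              # responds only near the edge
print(conv2d_valid(img, smooth))            # blurred version of the step
```

Note that the edge kernel sums to zero, so it gives exactly zero on any constant region and responds only where the intensity changes.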
How to find the right Kernels
Many different kernels can be designed, each producing a specific effect on an image
Let's take the opposite approach
Instead of designing the kernel by hand, we learn the kernel from data
A feature extractor can be learned from data using a deep learning framework
ANN structure for object detection in images
%%html
<center><iframe src="https://www.youtube.com/embed/0Hr5YwUUhr0?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
Convolution in CNNs
(Deep) Artificial Neural Networks
(Deep) Convolutional Neural Networks
%%html
<center><iframe src="https://www.youtube.com/embed/ISHGyvsT0QY?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/W4xtf8LTz1c?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/FG7M9tWH2nQ?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/utOv-BKI_vo?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
A ConvNet performs better than a fully connected network with the same number of parameters, because it exploits prior knowledge about images (local structure and translation invariance).
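To make the parameter savings concrete, here is a rough count (illustrative numbers, not from this notebook) comparing one dense layer on a flattened 28×28 image with one 3×3 convolution layer:

```python
# Fully connected: every input pixel connects to every one of 100 hidden units
n_dense = 28 * 28 * 100 + 100      # weights + biases

# Convolutional: a single 3x3 kernel per output channel (32 channels here)
# is shared across all spatial positions of the image
n_conv = 3 * 3 * 1 * 32 + 32       # weights + biases

print(n_dense, n_conv)             # 78500 vs 320
```

The convolution layer uses orders of magnitude fewer parameters, precisely because weight sharing encodes the prior that the same local feature is useful everywhere in the image.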
%%html
<center><iframe src="https://www.youtube.com/embed/z6k_RMKExlQ?start=5150&end=6132&rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
# Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Convolution layers
Fully connected layers
# input layer
input_h = 28 # Input height
input_w = 28 # Input width
input_ch = 1 # Input channel : Gray scale
# (None, 28, 28, 1)
# First convolution layer
k1_h = 3
k1_w = 3
k1_ch = 32
p1_h = 2
p1_w = 2
# (None, 14, 14 ,32)
# Second convolution layer
k2_h = 3
k2_w = 3
k2_ch = 64
p2_h = 2
p2_w = 2
# (None, 7, 7 ,64)
## Fully connected: flatten the features -> (None, 7*7*64)
conv_result_size = int(input_h/(p1_h*p2_h)) * int(input_w/(p1_w*p2_w)) * k2_ch
n_hidden = 100
n_output = 10
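The shapes annotated in the comments above can be verified with a quick calculation (each 'SAME' convolution keeps the spatial size, and each 2×2 max-pool halves it):

```python
h, w = 28, 28          # input height and width
h, w = h // 2, w // 2  # after first 2x2 max-pool  -> (14, 14, 32)
h, w = h // 2, w // 2  # after second 2x2 max-pool -> (7, 7, 64)
flat = h * w * 64      # flattened features fed to the dense layer
print(h, w, flat)      # 7 7 3136
```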
# kernel size: [kernel_height, kernel_width, input_ch, output_ch]
weights = {
    'conv1'  : tf.Variable(tf.random_normal([k1_h, k1_w, input_ch, k1_ch], stddev = 0.1)),
    'conv2'  : tf.Variable(tf.random_normal([k2_h, k2_w, k1_ch, k2_ch], stddev = 0.1)),
    'hidden' : tf.Variable(tf.random_normal([conv_result_size, n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_hidden, n_output], stddev = 0.1))
}
# bias size: [output_ch] or [neuron_size]
biases = {
    'conv1'  : tf.Variable(tf.random_normal([k1_ch], stddev = 0.1)),
    'conv2'  : tf.Variable(tf.random_normal([k2_ch], stddev = 0.1)),
    'hidden' : tf.Variable(tf.random_normal([n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}
# input layer: [batch_size, image_height, image_width, channels]
# output layer: [batch_size, class_size]
x = tf.placeholder(tf.float32, [None, input_h, input_w, input_ch])
y = tf.placeholder(tf.float32, [None, n_output])
First, the layer performs several convolutions to produce a set of linear activations
tf.nn.conv2d(input, filter, strides, padding)
input = tensor of shape [None, input_h, input_w, input_ch]
filter = tensor of shape [k_h, k_w, input_ch, output_ch]
strides = [1, s_h, s_w, 1]
padding = 'SAME'
filter size
stride
padding
'SAME' : zero padding

Second, each linear activation is passed through a nonlinear activation function (e.g., ReLU)
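As a quick check, the output spatial sizes implied by the two padding modes can be computed directly (a sketch of the TensorFlow size convention for stride `s` and kernel size `k`):

```python
import math

def out_size(n, k, s, padding):
    # TensorFlow convention:
    #   'SAME'  : ceil(n / s)            (zero-pads; size depends only on stride)
    #   'VALID' : ceil((n - k + 1) / s)  (no padding)
    if padding == 'SAME':
        return math.ceil(n / s)
    return math.ceil((n - k + 1) / s)

print(out_size(28, 3, 1, 'SAME'))    # 28 : the 3x3 conv keeps the spatial size
print(out_size(28, 2, 2, 'VALID'))   # 14 : the 2x2 max-pool halves it
```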
Third, use pooling to further modify the output of the layer
tf.nn.max_pool(value, ksize, strides, padding)
value = tensor of shape [None, input_h, input_w, input_ch]
ksize = [1, p_h, p_w, 1]
strides = [1, p_h, p_w, 1]
padding = 'VALID'
ksize
strides
padding
'VALID' : no padding

Dense (fully connected) layer
The input is typically the flattened feature vector
Then, apply softmax for multiclass classification
The output of the softmax function is equivalent to a categorical probability distribution: it tells you the probability of each class
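A minimal NumPy sketch of softmax (an illustration, not the TensorFlow op itself):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p, p.sum(), np.argmax(p))    # probabilities sum to 1; class 0 most likely
```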
# [batch, height, width, channels]
def net(x, weights, biases):
    # First convolution layer
    conv1 = tf.nn.conv2d(x,
                         weights['conv1'],
                         strides = [1, 1, 1, 1],
                         padding = 'SAME')
    conv1 = tf.nn.relu(tf.add(conv1, biases['conv1']))
    maxp1 = tf.nn.max_pool(conv1,
                           ksize = [1, p1_h, p1_w, 1],
                           strides = [1, p1_h, p1_w, 1],
                           padding = 'VALID')

    # Second convolution layer
    conv2 = tf.nn.conv2d(maxp1,
                         weights['conv2'],
                         strides = [1, 1, 1, 1],
                         padding = 'SAME')
    conv2 = tf.nn.relu(tf.add(conv2, biases['conv2']))
    maxp2 = tf.nn.max_pool(conv2,
                           ksize = [1, p2_h, p2_w, 1],
                           strides = [1, p2_h, p2_w, 1],
                           padding = 'VALID')
    maxp2_flatten = tf.reshape(maxp2, [-1, conv_result_size])

    # Fully connected
    hidden = tf.add(tf.matmul(maxp2_flatten, weights['hidden']), biases['hidden'])
    hidden = tf.nn.relu(hidden)
    output = tf.add(tf.matmul(hidden, weights['output']), biases['output'])
    return output
Loss
Optimizer
LR = 0.0001
pred = net(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = pred)
loss = tf.reduce_mean(loss)
optm = tf.train.AdamOptimizer(LR).minimize(loss)
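What `tf.nn.softmax_cross_entropy_with_logits` computes can be reproduced with a small NumPy sketch (an illustration of the math, not the TensorFlow implementation):

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # log-softmax via the log-sum-exp trick, row-wise, for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

labels = np.array([[0., 1., 0.]])            # one-hot target: class 1
logits = np.array([[0.5, 2.0, -1.0]])
print(softmax_cross_entropy(labels, logits).mean())
```

Averaging this per-example loss over the batch corresponds to the `tf.reduce_mean(loss)` step above.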
Define hyperparameters for training the CNN

n_batch : mini-batch size for stochastic gradient descent
n_iter : the number of training steps
n_prt : print the loss every n_prt iterations

n_batch = 50
n_iter = 2500
n_prt = 250
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    train_x = np.reshape(train_x, [-1, input_h, input_w, input_ch])
    sess.run(optm, feed_dict = {x: train_x, y: train_y})

    if epoch % n_prt == 0:
        test_x, test_y = mnist.test.next_batch(n_batch)
        test_x = np.reshape(test_x, [-1, input_h, input_w, input_ch])
        c1 = sess.run(loss, feed_dict = {x: train_x, y: train_y})
        c2 = sess.run(loss, feed_dict = {x: test_x, y: test_y})
        loss_record_train.append(c1)
        loss_record_test.append(c2)
        print("Iter : {}".format(epoch))
        print("Cost : {}".format(c1))
plt.figure(figsize = (10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt, loss_record_train, label = 'train')
plt.plot(np.arange(len(loss_record_test))*n_prt, loss_record_test, label = 'test')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0, np.max(loss_record_train)])
plt.show()
test_x, test_y = mnist.test.next_batch(100)
my_pred = sess.run(pred, feed_dict = {x: test_x.reshape(-1, 28, 28, 1)})
my_pred = np.argmax(my_pred, axis = 1)
labels = np.argmax(test_y, axis = 1)
accr = np.mean(np.equal(my_pred, labels))
print("Accuracy : {}%".format(accr*100))
test_x, test_y = mnist.test.next_batch(1)
logits = sess.run(tf.nn.softmax(pred), feed_dict = {x: test_x.reshape(-1, 28, 28, 1)})
predict = np.argmax(logits)
plt.figure(figsize = (12,5))
plt.subplot(1,2,1)
plt.imshow(test_x.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(logits.ravel())
plt.show()
np.set_printoptions(precision = 2, suppress = True)
print('Prediction : {}'.format(predict))
print('Probability : {}'.format(logits.ravel()))
%%html
<center><iframe src="https://www.youtube.com/embed/baPLXhjslL8?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
input_h = 28
input_w = 28
input_ch = 1
k1_h = 3
k1_w = 3
k1_ch = 32
p1_h = 2
p1_w = 2
k2_h = 3
k2_w = 3
k2_ch = 64
p2_h = 2
p2_w = 2
conv_result_size = int(input_h/(p1_h*p2_h)) * int(input_w/(p1_w*p2_w)) * k2_ch
n_hidden = 100
n_output = 10
x = tf.placeholder(tf.float32, [None, input_h, input_w, input_ch])
y = tf.placeholder(tf.float32, [None, n_output])
def net(x):
    ## First convolution layer
    conv1 = tf.layers.conv2d(inputs = x,
                             filters = 32,
                             kernel_size = [3, 3],
                             padding = "SAME",
                             activation = tf.nn.relu,
                             kernel_initializer = tf.initializers.random_normal())
    maxp1 = tf.layers.max_pooling2d(inputs = conv1,
                                    pool_size = [2, 2],
                                    strides = 2)

    ## Second convolution layer
    conv2 = tf.layers.conv2d(inputs = maxp1,
                             filters = 64,
                             kernel_size = [3, 3],
                             padding = "SAME",
                             activation = tf.nn.relu,
                             kernel_initializer = tf.initializers.random_normal())
    maxp2 = tf.layers.max_pooling2d(inputs = conv2,
                                    pool_size = [2, 2],
                                    strides = 2)
    maxp2_re = tf.reshape(maxp2, [-1, conv_result_size])

    ### Fully connected (= dense) layer
    hidden = tf.layers.dense(inputs = maxp2_re,
                             units = n_hidden,
                             activation = tf.nn.relu)
    output = tf.layers.dense(inputs = hidden,
                             units = n_output)
    return output
LR = 0.0001
pred = net(x)
loss = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = pred)
loss = tf.reduce_mean(loss)
optm = tf.train.AdamOptimizer(LR).minimize(loss)
n_batch = 50
n_iter = 2500
n_prt = 250
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    train_x = np.reshape(train_x, [-1, input_h, input_w, input_ch])
    sess.run(optm, feed_dict = {x: train_x, y: train_y})

    if epoch % n_prt == 0:
        test_x, test_y = mnist.test.next_batch(n_batch)
        test_x = np.reshape(test_x, [-1, input_h, input_w, input_ch])
        c1 = sess.run(loss, feed_dict = {x: train_x, y: train_y})
        c2 = sess.run(loss, feed_dict = {x: test_x, y: test_y})
        loss_record_train.append(c1)
        loss_record_test.append(c2)
        print("Iter : {}".format(epoch))
        print("Cost : {}".format(c1))
plt.figure(figsize = (10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt, loss_record_train, label = 'train')
plt.plot(np.arange(len(loss_record_test))*n_prt, loss_record_test, label = 'test')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0, np.max(loss_record_train)])
plt.show()
test_x, test_y = mnist.test.next_batch(100)
my_pred = sess.run(pred, feed_dict = {x: test_x.reshape(-1, 28, 28, 1)})
my_pred = np.argmax(my_pred, axis = 1)
labels = np.argmax(test_y, axis = 1)
accr = np.mean(np.equal(my_pred, labels))
print("Accuracy : {}%".format(accr*100))
test_x, test_y = mnist.test.next_batch(1)
logits = sess.run(tf.nn.softmax(pred), feed_dict = {x: test_x.reshape(-1, 28, 28, 1)})
predict = np.argmax(logits)
plt.figure(figsize = (12,5))
plt.subplot(1,2,1)
plt.imshow(test_x.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(logits.ravel())
plt.show()
np.set_printoptions(precision = 2, suppress = True)
print('Prediction : {}'.format(predict))
print('Probability : {}'.format(logits.ravel()))
%%html
<center><iframe src="https://www.youtube.com/embed/3JQ3hYko51Y?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/FTr3n7uBIuE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
CS231n Convolutional Neural Networks for Visual Recognition at Stanford Univ.
%%html
<center><iframe src="https://www.youtube.com/embed/LxfUGhug-iQ?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/bNb2fEVKeEo?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/NVH8EYPHi30?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')